Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An alternative implementation to speed up iterating successive DcmList elements using seek_to #114

Conversation

reunanen
Copy link

@reunanen reunanen commented Jan 3, 2025

Problem: Reading a real-world DICOM image was a bit slow, and simple profiling suggested that some 50% of the time (*) was spent in DcmPixelSequence essentially iterating a long list using successive calls to DcmList::seek_to (which has a random-access-like interface, but internally iterates over a doubly linked list)

Solution: In DcmList, keep the current absolute position, essentially catering for O(1) access whenever going to the previous or the next element (NB: truly random access is still O(n), though)

(*): the other 50% was spent decoding jpeg tiles, which is legit use of time of course

…profiling suggested that some 50% of the time (*) was spent in `DcmPixelSequence` essentially iterating a long list using successive calls to `DcmList::seek_to` (which has a random-access-like interface, but internally _iterates_ over a doubly linked list)

Solution: In `DcmList`, keep the current absolute position, essentially catering for O(1) access whenever going to the previous or the next element (NB: truly _random_ access is still O(n), though)

(*): the other 50% was spent decoding jpeg tiles, which is legit use of time of course
@reunanen
Copy link
Author

reunanen commented Jan 3, 2025

This is an alternative implementation to address the problem reported in #113.

// `seek_to` to essentially iterate over the sequence, for example. If so, then
// let's start iterating from the current position. Often the position we want
// is simply the next position (or maybe the previous one). Let's make those
// use cases be O(1), and not O(n).
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the beef of the PR. Basically the complexity (of the updated seek_to method) should be O(min(m,n))), where m is the distance of the desired position from the current position (and n is the number of elements in the list). Note that in the relevant actual use case (DcmPixelSequence), it seems that typically m = 1.

@jriesmeier
Copy link
Member

jriesmeier commented Jan 6, 2025

Thank you for your second approach to deal with this issue. I still wonder how the application accesses the items of the DcmPixelSequence. If it really iterates all items, then using nextObject() or nextInContainer() would be an appropriate solution that does not require any changes to the DCMTK.

In any case, I agree that the performance of DcmList could/should be improved for various use cases. In this context, your new approach looks really promising. I'll put this to the agenda of the next week's DCMTK team meeting.

@reunanen
Copy link
Author

reunanen commented Jan 6, 2025

The application simply constructs a DicomImage object and then repeatedly calls processNextFrames (and getOutputData). So the application does not directly deal with DcmPixelSequences, or anything like that.

For what it's worth, the call stack basically looks like this (line numbers as of the version having tag DCMTK-3.6.9):

DicomImage::processNextFrames(const unsigned long), line 173
 DiRGBImage::processNextFrames(const unsigned long fcount), line 95
  DiImage::processNextFrames(const unsigned long fcount), line 463
   DiImage::convertPixelData(), line 568
    DiInputPixelTemplate<unsigned char,unsigned char>::DiInputPixelTemplate<unsigned char,unsigned char>(const DiDocument * document, const unsigned short alloc, const unsigned short stored, const unsigned short high, const unsigned long first, const unsigned long number, const unsigned long fsize, DcmFileCache * fileCache, unsigned int & fragment), line 158
     DiInputPixelTemplate<unsigned char,unsigned char>::convert(const DiDocument * document, const unsigned short bitsAllocated, const unsigned short bitsStored, const unsigned short highBit, DcmFileCache * fileCache, unsigned int & fragment), line 428
      DcmPixelData::getUncompressedFrame(DcmItem * dataset, unsigned int frameNo, unsigned int & startFragment, void * buffer, unsigned int bufSize, OFString & decompressedColorModel, DcmFileCache * cache), line 1223
       DcmCodecList::decodeFrame(const DcmXfer & fromType, const DcmRepresentationParameter * fromParam, DcmPixelSequence * fromPixSeq, DcmItem * dataset, unsigned int frameNo, unsigned int & startFragment, void * buffer, unsigned int bufSize, OFString & decompressedColorModel), line 509
        DJCodecDecoder::decodeFrame(const DcmRepresentationParameter * fromParam, DcmPixelSequence * fromPixSeq, const DcmCodecParameter * cp, DcmItem * dataset, unsigned int frameNo, unsigned int & startFragment, void * buffer, unsigned int bufSize, OFString & decompressedColorModel), line 545
         DcmPixelSequence::getItem(DcmPixelItem * & item, const unsigned long num), line 272
          DcmList::seek_to(unsigned long absolute_position), line 257

Maybe the main culprit is around here (note that pixSeq->getItem calls DcmList::seek_to), but anyway to me that's inside DCMTK? Please let me know if you'd like me to optimize that part, instead of touching DcmList.

@jriesmeier
Copy link
Member

jriesmeier commented Jan 6, 2025

I see, so the problem lies either in the ugly implementation of the DcmList::seek_to() method or in the ugly use of DcmList::getItem() in the DJCodecDecoder::decodeFrame() method.

As I wrote in my previous comment, we will discuss this internally during our team meeting end of next week. Thanks again.

@reunanen
Copy link
Author

reunanen commented Jan 6, 2025

In any case, I agree that the performance of DcmList could/should be improved for various use cases.

Generally speaking I would say that DcmList should either a) be a regular (doubly) linked list and not offer seek_to in the first place (like std::list, which does not even pretend to support random access), or b) offer seek_to that has reasonable performance.

However, I guess it's hard to remove seek_to at this point (as it may be used here and there), and also it may be hard to make it perform well in general, without compromising insertion complexity for example. (So this PR tries to make it perform well in this special case that is actually used elsewhere in the library.)

@reunanen
Copy link
Author

reunanen commented Jan 6, 2025

As I wrote in my previous comment, we will discuss this internally during our team meeting end of next week.

Thank you. I think in the meanwhile we will proceed with a fork that uses either #113 or this #114 (because our client actually expects us to improve speed on this front – and as said, this is the hot spot currently, in addition to decoding jpeg tiles which of course is expected to consume CPU cycles... and as also said, reading a corresponding Aperio .svs file is much faster at the moment).

So if we find any problems with whichever approach we choose, I will come back here and post our findings.

If you guys later come up with another solution (such as optimizing DJCodecDecoder::decodeFrame instead), we should be able to revert ours and merge yours.

Thanks again for your response!

@jriesmeier
Copy link
Member

Generally speaking I would say that DcmList should either a) be a regular (doubly) linked list and not offer seek_to in the first place (like std::list, which does not even pretend to support random access), or b) offer seek_to that has reasonable performance.

I think this bad design of DcmList is part of the DCMTK from the very beginning, i.e. for more than 30 years now, but it's never too late to fix a real problem.

If you guys later come up with another solution (such as optimizing DJCodecDecoder::decodeFrame instead), we should be able to revert ours and merge yours.

I will keep you informed about the outcome of our discussion.

@reunanen
Copy link
Author

I think in the meanwhile we will proceed

For what it's worth: we've moved on using what corresponds to this PR #114 (and not #113). So far no problems or anything.

@jriesmeier
Copy link
Member

Thank you for the update. This kind of information is really useful for our discussion end of next week.

@@ -155,6 +171,7 @@ DcmObject *DcmList::insert( DcmObject *obj, E_ListPos pos )
node->nextNode = currentNode;
currentNode->prevNode = node;
currentNode = node;
currentAbsolutePosition++;
Copy link
Member

@jriesmeier jriesmeier Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reunanen Isn't this increment wrong as the new node gets the position of the current node?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Not sure what I was thinking. 😬

Added test criteria, and fixed the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for confirming my observation and also for the updated regression test.

@jriesmeier
Copy link
Member

@reunanen As announced earlier, today we've talked about your two PRs at our DCMTK team meeting. We agreed that PR #114 is the one to incorporate into the "testing" branch of the DCMTK. I'll take your proposed changed, make a few modification (to meet our mainly undocumented "style guides"), test it and the commit the changes. It will take a few days until the changes are visible in DCMTK's public git repository (because of the nightly builds on various platforms and additional integration tests that are performed on the "testing" branch).

Thanks again for your helpful contribution to the DCMTK!

michaelonken pushed a commit that referenced this pull request Jan 18, 2025
Significantly improved the performance of the seek_to() method, in
particular when iterating a very long list using the position of the
items, e.g. a DcmPixelSequence with a huge number of instances of
DcmPixelItem such as in Whole Slide Imaging (WSI) objects. Internally,
the seek_to() method is, e.g., used in the insert(), remove() and
getItem() methods of DcmSequenceOfItems and DcmPixelSequence as well
as in the getElement() and remove() method of DcmItem.

Thanks to GitHub user "reunanen" for the report and proposed patch.

This closes GitHub pull request #114.
@jriesmeier
Copy link
Member

The required changes have been committed (see f58412b).
Please also note the following commits regarding DcmList.

@jriesmeier jriesmeier closed this Jan 18, 2025
@reunanen reunanen deleted the speed-up-iterating-successive-DcmList-elements-using-seek_to branch January 18, 2025 12:09
@reunanen
Copy link
Author

Thanks! However, don't this and this and this potentially leak memory?

Ok surely one could do something like this at the call site – but I'm afraid it may be a bit too easy to forget to do it?

auto *item = new DcmItem();
if (list.append(item) == nullptr) {
  delete item;
}

For example, it doesn't look like DcmSequenceOfItems::insert or DcmSequenceOfItems::insertAtCurrentPos are doing that at the moment.

On a related note, would it be a practical choice to change the cardinality and the currentPosition members to be size_t, or uint64_t? (And update also invalidListPosition correspondingly, of course.)

@reunanen
Copy link
Author

reunanen commented Jan 18, 2025

(And yes I am aware that my original solution would have blatantly failed when inserting 0xffffffff or more items...)

@jriesmeier
Copy link
Member

jriesmeier commented Jan 18, 2025

Yes, I agree that now there are potential memory leaks when the list is "full", which is not very likely but also not impossible. In general, the caller should take care of freeing the memory because he allocated it. Currently, the return value of append(), prepend() and insert() does not seem to be tested by the callers. However, I think before this commit it was worse, as there was no check on the maximum size of the list at all, which would probably have resulted in "strange" behavior.

Using size_t instead of unsigned long would minimize the probability that the list is full (at least on some systems), but it would also require to check where e.g. DcmList::card() is used and whether these methods actually expect a size_t value.

You see, it would require more investigation and probably more changes in other classes. Of course, if you like, you could investigate this in more detail and come up with a proposal. We would really appreciate this.

@reunanen
Copy link
Author

In general, the caller should take care of freeing the memory because he allocated it.

I agree that this is a good design principle, but I don't think DcmList in general follows it? (See e.g. the destructor?)

@jriesmeier
Copy link
Member

jriesmeier commented Jan 18, 2025

In general, the caller should take care of freeing the memory because he allocated it.

I agree that this is a good design principle, but I don't think DcmList in general follows it? (See e.g. the destructor?)

OK, maybe not in general, but in case of error (at least for DcmList). That the destructor frees the memory makes sense, since there is no other data structure that records the created items (and thus the allocated memory). Also, append(), prepend() and insert() should not delete the passed item (DcmObject*) because the callers currently do not check the return value of these methods at all.

You have to understand that DcmList was "designed" and implemented back in 1994 by a student (not me ;-), so it is probably not the best part of the DCMTK, but one of the oldest. The oldest part of the DCMTK is the ANSI C code in dcmnet, which was developed for the first DICOM demonstration in 1993.

@reunanen
Copy link
Author

Yeah, I'm not criticizing, just trying to understand where we are and how we could (realistically) be in an even better place.

As you wrote: "it's never too late to fix a real problem". :)

@jriesmeier
Copy link
Member

Yes, that's totally true :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants